Setting Up a DMA Transfer
There are two issues in preparing a DMA transfer:
- calculating physical addresses of the memory targets to be programmed into the device registers
- ensuring cache coherency in a uniprocessor
The functions you use to derive target addresses differ from one bus adapter to another and are discussed in the chapters on those buses.
Note: In addition, when designing a device driver for use in a Challenge or Onyx system, be sure to read the caution in Appendix B, "Challenge DMA with Multiple IO4 Boards."
Converting Physical Addresses
General functions for calculating physical addresses are summarized in Table 9-12.
Functions Related to Physical Memory
Function Name | Header Files | Can Sleep? | Purpose
---|---|---|---
kvtophys(D3) | ddi.h | N | Get physical address of kernel data.
sgset(D3) | ddi.h & sg.h | N | Get physical addresses of a series of pages for simulated scatter/gather.
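For example, a driver might use kvtophys() to program a kernel buffer's address into a device. The following is a minimal sketch; the register layout and function names are hypothetical, and the paddr_t type used for the result of kvtophys() is an assumption (see the kvtophys(D3) reference page for the exact prototype).

```c
#include <sys/types.h>
#include <sys/ddi.h>

/* Hypothetical device register layout, for illustration only. */
typedef struct hyp_regs {
    volatile __uint32_t dma_addr;   /* DMA target address */
    volatile __uint32_t dma_count;  /* DMA byte count */
} hyp_regs_t;

/*
 * Sketch: program a hypothetical device for a transfer to or from
 * a kernel buffer, using kvtophys() to get the physical address.
 */
void
hyp_program_dma(hyp_regs_t *regs, void *kvaddr, __uint32_t nbytes)
{
    paddr_t pa = kvtophys(kvaddr);      /* physical address of kernel data */

    regs->dma_addr = (__uint32_t)pa;    /* assumes a 32-bit bus address */
    regs->dma_count = nbytes;
}
```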
Managing Buffer Virtual Addresses
Functions to manipulate buffer page mappings are summarized in Table 9-13.
Functions to Map Buffer Pages
Function Name | Header Files | Can Sleep? | Purpose
---|---|---|---
bp_mapin(D3) | buf.h | Y | Map buffer pages into kernel virtual address space.
bp_mapout(D3) | buf.h | N | Release mapping of buffer pages.
bptophys(D3) | ddi.h | N | Get physical address of buffer data.
clrbuf(D3) | buf.h | N | Clear the memory described by a mapped-in buf_t.
getnextpg(D3) | buf.h | N | Return pfdat structure for next page.
pptophys(D3) | buf.h | N | Return the physical address of a page described by a pfdat structure.
When a pfxstrategy() routine receives a buf_t that is not mapped into memory (see "Buffer Location and b_flags"), it must make sure that the pages of the buffer are in memory, and it must obtain valid kernel virtual addresses that describe them. The simplest way is to apply the bp_mapin() function to the buf_t. This function allocates a contiguous range of page table entries in the kernel address space to describe the buffer, creating a mapping of the buffer pages to a contiguous range of kernel virtual addresses. It sets the virtual address of the first data byte in b_un.b_addr, and sets the flags so that BP_ISMAPPED() returns true--thus converting an unmapped buffer into a mapped one.
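The following is a minimal sketch of this mapped-in path in a strategy routine; the "hyp" driver prefix and the hyp_start_io() routine that actually queues and starts the transfer are hypothetical placeholders.

```c
#include <sys/types.h>
#include <sys/buf.h>
#include <sys/ddi.h>

extern void hyp_start_io(struct buf *bp, caddr_t kvaddr);  /* hypothetical */

/*
 * Sketch of a strategy routine that maps in an unmapped buffer
 * before starting the transfer.
 */
void
hypstrategy(struct buf *bp)
{
    caddr_t kvaddr;

    if (!BP_ISMAPPED(bp))
        bp_mapin(bp);            /* may sleep; maps all buffer pages */

    kvaddr = bp->b_un.b_addr;    /* kernel virtual address of first byte */
    hyp_start_io(bp, kvaddr);    /* hypothetical: program the device */
}
```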
When the device does not support scatter/gather DMA, there is a disadvantage to using bp_mapin(). Without scatter/gather, each page's worth of data must be set up as a separate I/O operation, so there is no need to keep all of a possibly large buffer mapped into kernel space for the duration of the transfer. Using getnextpg(), a driver can step through the pfdat structures that describe the successive pages of the buffer, calling pptophys() to get the physical address of each page frame.
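A page-at-a-time loop might look like the following sketch. The calling convention assumed here (passing NULL on the first call to obtain the first page), the assumption that the buffer starts on a page boundary, and the hyp_dma_one_page() helper are all assumptions for illustration; consult the getnextpg(D3) and pptophys(D3) reference pages for the exact prototypes and usage.

```c
#include <sys/types.h>
#include <sys/buf.h>
#include <sys/immu.h>    /* NBPP: bytes per page */
#include <sys/pfdat.h>   /* pfd_t; header name is an assumption */

extern void hyp_dma_one_page(paddr_t pa, size_t nbytes);  /* hypothetical */

/*
 * Sketch: drive a non-scatter/gather device one page at a time.
 * ASSUMPTIONS: getnextpg(bp, NULL) yields the pfdat for the first
 * page, and the buffer starts on a page boundary.
 */
void
hyp_dma_by_pages(struct buf *bp)
{
    pfd_t *pfd = NULL;
    size_t remaining = bp->b_bcount;

    while (remaining > 0 && (pfd = getnextpg(bp, pfd)) != NULL) {
        size_t chunk = (remaining < NBPP) ? remaining : NBPP;

        hyp_dma_one_page(pptophys(pfd), chunk);  /* one page per operation */
        remaining -= chunk;
    }
}
```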
Managing Memory for Cache Coherency
Some kernel functions used for ensuring cache coherency are summarized in Table 9-14.
Functions Related to Cache Coherency
Function Name | Header Files | Can Sleep? | Purpose
---|---|---|---
dki_dcache_inval(D3) | systm.h & types.h | N | Invalidate the data cache for a given range of virtual addresses.
dki_dcache_wb(D3) | systm.h & types.h | N | Write back the data cache for a given range of virtual addresses.
dki_dcache_wbinval(D3) | systm.h & types.h | N | Write back and invalidate the data cache for a given range of virtual addresses.
flushbus(D3) | systm.h & types.h | ? | Make sure contents of the write buffer are flushed to the system bus.
The functions for cache invalidation are essential when doing DMA on a uniprocessor. They cost very little to use in a multiprocessor, so it does no harm to call them in every system. You call them as follows (a brief sketch follows the list):
- Call dki_dcache_inval() prior to doing DMA input. This ensures that when you refer to the received data, it will be loaded from real memory.
- Call dki_dcache_wb() prior to doing DMA output. This ensures that the latest contents of cache memory are in system memory for the device to load.
- Call dki_dcache_wbinval() prior to a device operation that samples memory and then stores new data.
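The following is a minimal sketch of the first two rules; the buffer, its length, and the routines that start the device are hypothetical placeholders, and the exact prototypes of the dki_dcache_* functions are given in their (D3) reference pages.

```c
#include <sys/types.h>
#include <sys/systm.h>

extern void hyp_start_input_dma(void *buf, size_t len);   /* hypothetical */
extern void hyp_start_output_dma(void *buf, size_t len);  /* hypothetical */

/* Sketch: invalidate before DMA input (device writes memory). */
void
hyp_dma_input(void *buf, size_t len)
{
    dki_dcache_inval(buf, len);     /* discard stale cached copies first */
    hyp_start_input_dma(buf, len);
}

/* Sketch: write back before DMA output (device reads memory). */
void
hyp_dma_output(void *buf, size_t len)
{
    dki_dcache_wb(buf, len);        /* push latest data out to memory first */
    hyp_start_output_dma(buf, len);
}
```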
The flushbus() function is needed because in some systems the hardware collects output data and writes it to the bus in blocks. If you write a small amount of data to a device with PIO, delay, and then write again, the two writes could be batched and arrive at the device in quick succession. Use flushbus() after a PIO output that is followed by PIO input from the same device. Use it also between two PIO outputs when the device is supposed to see a delay between them.
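For example, a PIO command write followed by a dependent PIO status read might use flushbus() as in this sketch; the register pointers and command value are hypothetical.

```c
#include <sys/types.h>
#include <sys/systm.h>

/*
 * Sketch: a PIO command write followed by a dependent PIO status read.
 * The register layout (cmd_reg, status_reg) is hypothetical.
 */
uint
hyp_command_and_status(volatile uint *cmd_reg, volatile uint *status_reg)
{
    *cmd_reg = 0x1;        /* PIO output: issue a command */
    flushbus();            /* force the write out of the write buffer */
    return *status_reg;    /* PIO input: now safe to read status */
}
```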
DMA Buffer Alignment
In some systems, the buffers used for DMA must be aligned on a boundary the size of a cache line in the current CPU. Although not all system architectures require cache alignment, it does no harm to use cache-aligned buffers in all cases. The size of a cache line varies among CPU models, but if you obtain a DMA buffer using the KMEM_CACHEALIGN flag of kmem_alloc(), the buffer is properly aligned. The buffer returned by geteblk() (see "Allocating buf_t Objects and Buffers") is cache-aligned.
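For example, a driver might obtain a cache-aligned DMA buffer as in the following sketch; the function names are hypothetical, the choice of KM_SLEEP is situational, and error handling is omitted for brevity.

```c
#include <sys/types.h>
#include <sys/kmem.h>

/*
 * Sketch: allocate and free a cache-aligned DMA buffer.
 * KMEM_CACHEALIGN aligns the buffer on a cache-line boundary.
 */
void *
hyp_get_dma_buffer(size_t nbytes)
{
    return kmem_alloc(nbytes, KM_SLEEP | KMEM_CACHEALIGN);
}

void
hyp_free_dma_buffer(void *buf, size_t nbytes)
{
    kmem_free(buf, nbytes);
}
```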
Why is cache alignment necessary? Suppose you have a variable, X, adjacent to a buffer into which a device is about to write by DMA, so that X shares a cache line with the end of the buffer. If you invalidate the buffer prior to the DMA input, but then reference X, the resulting cache miss loads a full cache line and brings part of the buffer back into the cache. When the DMA completes, that part of the cache is stale with respect to memory. If, instead, you invalidate the cache after the DMA completes, you destroy the cached value of X.
Maximum DMA Transfer Size
The maximum size of a single DMA transfer is set by the system tuning variable maxdmasz, which you can change with the systune command (see the systune(1M) reference page). A single I/O operation larger than this limit fails with the error ENOMEM.
The unit of measure for maxdmasz is pages, and the page size varies with the kernel: under IRIX 6.2, a 32-bit kernel uses 4 KB pages while a 64-bit kernel uses 16 KB pages. In both kernels, maxdmasz is shipped with the value 1024 decimal, equivalent to 4 MB in a 32-bit kernel and 16 MB in a 64-bit kernel.
In Challenge and Onyx systems, maxdmasz can be set as high as 64 MB. However, it is not usually possible to allocate a DMA map for a single transfer that large--see "Mapping DMA Addresses".